Abstract

The Parikh finite word automaton (PA) was introduced and studied by Klaedtke and Rueß [16].
Natural variants of the PA arise from viewing a PA equivalently as an automaton that keeps
a count of its transitions and semilinearly constrains their numbers. Here we adopt this view
and define the affine PA (APA), which extends the PA by having each transition induce an affine
transformation on the PA registers, and the PA on letters (LPA), which restricts the PA by
forcing any two transitions on the same letter to affect the registers equally. Then we report
on the expressiveness, closure, and decidability properties of such PA variants. We note that
deterministic PA are strictly weaker than deterministic reversal-bounded counter machines. We
develop pumping-style lemmas and identify an explicit PA language recognized by no deterministic PA.
Our findings and the resulting overall picture are tabulated in our concluding section.



1. Introduction 

Adding features to finite automata in order to capture situations beyond regularity has been
fruitful to many areas of research, in particular model checking and complexity theory below
NC$^2$ (e.g., [IZl EI]). One such finite automaton extension is the Parikh automaton (PA): A
PA [16] is a pair $(A, C)$ where $C$ is a semilinear subset of $\mathbb{N}^d$ and $A$ is a finite automaton
over $(\Sigma \times D)$ for $\Sigma$ a finite alphabet and $D$ a finite subset of $\mathbb{N}^d$. The PA accepts the word
$w_1 \cdots w_n \in \Sigma^*$ if $A$ accepts a word $(w_1, \bar{v}_1) \cdots (w_n, \bar{v}_n)$ such that $\sum \bar{v}_i \in C$. Klaedtke and Rueß
used PA to characterize an extension of (existential) monadic second-order logic in which the
cardinality of sets expressed by second-order variables is available.



Here we carry the study of Parikh automata a little further. First we introduce related models
of independent interest, each involving a finite automaton $A$ and a constraint set $C$ of vectors.
(The main text has formal definitions.) (1) Constrained automata (CA) are defined to accept
a word $w \in \Sigma^*$ iff the Parikh image of some accepting run of $A$ on $w$ (i.e., the vector recording
the number of occurrences of each transition along the run) belongs to $C$. (2) Affine Parikh
automata (APA) generalize PA by allowing each transition to perform a linear transformation
on the $d$-tuple of PA registers prior to adding a new vector; an APA accepts a word $w$ iff some
accepting run of $A$ on $w$ maps the all-zero vector to a vector in $C$. (3) Parikh automata on
letters (LPA) restrict PA by imposing the condition that any transition on $(a, \bar{u}) \in (\Sigma \times D)$
and any transition on $(b, \bar{v}) \in (\Sigma \times D)$ must satisfy $\bar{u} = \bar{v}$ when $a = b$.

Then our main observations are the following: 

• CA and deterministic CA respectively capture the class $\mathcal{L}_{\mathrm{PA}}$ of PA languages and the
class $\mathcal{L}_{\mathrm{DetPA}}$ of deterministic PA languages.

• The language $\{a, b\}^* \cdot \{a^n \# a^n \mid n \in \mathbb{N}\}$ belongs to $\mathcal{L}_{\mathrm{PA}} \setminus \mathcal{L}_{\mathrm{DetPA}}$; these two classes were
only proved different in [16], using closure properties rather than an explicit witness.

• APA and deterministic APA over $\mathbb{Q}$ are no more powerful than the same models over $\mathbb{N}$.

• APA express more languages than PA, and only context-sensitive languages; moreover
the emptiness problem for deterministic APA is already undecidable.

• Languages of LPA are equivalent to regular languages with a constraint on the Parikh
image of their words.

• Refining [16] slightly, we compare our models with the reversal-bounded counter machines
(RBCM) defined by Ibarra [12], and show that $\mathcal{L}_{\mathrm{DetPA}}$ is a strict subset of the languages
expressed by deterministic RBCM.

• Further expressiveness properties, closure properties, decidability properties and
comparisons between the above models are derived. The overall resulting picture is summarized
in tabular form in Section 6.

2. Preliminaries 

We write $\mathbb{Z}$ for the integers, $\mathbb{N}$ for the nonnegative integers, $\mathbb{N}^+$ for $\mathbb{N} \setminus \{0\}$, $\mathbb{Q}$ for the rational
numbers, and $\mathbb{Q}^+$ for the strictly positive rational numbers. We use $\mathbb{K}$ to denote either $\mathbb{N}$ or
$\mathbb{Q}$. Let $d, d' \in \mathbb{N}^+$. Vectors in $\mathbb{K}^d$ are noted with a bar on top, e.g., $\bar{v}$ whose elements are
$v_1, \ldots, v_d$. For $C \subseteq \mathbb{K}^d$ and $D \subseteq \mathbb{K}^{d'}$, we write $C \cdot D$ for the set of vectors in $\mathbb{K}^{d+d'}$ which are the
concatenation of a vector of $C$ and a vector of $D$. We write $\bar{0} \in \{0\}^d$ for the all-zero vector, and
$\bar{e}_i \in \{0, 1\}^d$ for the vector having a 1 only in position $i$. We view $\mathbb{K}^d$ as the additive monoid
$(\mathbb{K}^d, +)$. For a monoid $(M, \cdot)$ and $S \subseteq M$, we write $S^*$ for the monoid generated by $S$, i.e., the
smallest submonoid of $(M, \cdot)$ containing $S$. A subset $E$ of $\mathbb{K}^d$ is $\mathbb{K}$-definable if it is expressible
as a first order formula which uses the function symbols $+$, $\lambda_c$ with $c \in \mathbb{K}$ corresponding to
the scalar multiplication, and the order $<$. More precisely, a subset $E$ of $\mathbb{K}^d$ is $\mathbb{K}$-definable iff
there is such a formula $\phi$ with $d$ free variables, with $(x_1, \ldots, x_d) \in E \Leftrightarrow \mathbb{K} \models \phi(x_1, \ldots, x_d)$. Let
us remark that $\mathbb{N}$-definable sets are the Presburger-definable sets and they coincide with the
semilinear sets [9], i.e., finite unions of sets of the form $\{\bar{a}_0 + k_1 \bar{a}_1 + \cdots + k_n \bar{a}_n \mid (\forall i)[k_i \in \mathbb{N}]\}$
for some $\bar{a}_i$'s in $\mathbb{N}^d$. Moreover, $\mathbb{Q}$-definable sets are the semialgebraic sets defined using affine
functions^1 [6, Corollary 1.7.8].

Let $\Sigma = \{a_1, \ldots, a_n\}$ be an (ordered) alphabet, and write $\varepsilon$ for the empty word. The Parikh
image is the morphism $\Phi: \Sigma^* \to \mathbb{N}^n$ defined by $\Phi(a_i) = \bar{e}_i$, for $1 \le i \le n$. A language $L \subseteq \Sigma^*$
is said to be semilinear if $\Phi(L) = \{\Phi(w) \mid w \in L\}$ is semilinear. The commutative closure of a
language $L$ is defined as the language $c(L) = \{w \mid \Phi(w) \in \Phi(L)\}$. A language $L \subseteq \Sigma^*$ is said
to be bounded if there exist $n > 0$ and $w_1, \ldots, w_n \in \Sigma^+$ such that $L \subseteq w_1^* \cdots w_n^*$. Two words
$u, v \in \Sigma^*$ are equivalent by the Nerode relation (w.r.t. $L$), if for all $w \in \Sigma^*$, $uw \in L \Leftrightarrow vw \in L$.
We then write $u \equiv_L v$ (or $u \equiv v$ when $L$ is understood), and write $[u]_L$ for the equivalence class
of $u$ w.r.t. the Nerode relation.

^1 Semialgebraic sets defined using affine functions are sometimes also called semilinear (e.g., [6]). In this
paper, we use "semilinear" only for $\mathbb{N}$-definable sets.
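The Parikh image and the commutative closure can be illustrated on small inputs. The following is a minimal Python sketch; the helper names `parikh_image` and `in_commutative_closure` are ours, and the language is assumed to be given as a finite sample of words, which is only a stand-in for the general definition.

```python
from collections import Counter

def parikh_image(word, alphabet):
    """Return the Parikh image of `word` as a tuple ordered like `alphabet`."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def in_commutative_closure(word, language, alphabet):
    """True iff `word` has the same Parikh image as some word of `language`.

    `language` is a finite collection of words (an assumption made here so
    that the set of Parikh images can be enumerated explicitly).
    """
    images = {parikh_image(w, alphabet) for w in language}
    return parikh_image(word, alphabet) in images

SIGMA = ("a", "b")  # ordered alphabet
```

For instance, `in_commutative_closure("ba", {"ab"}, SIGMA)` holds because both words contain one `a` and one `b`, whereas `"bb"` is not in the commutative closure of `{"ab"}`.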

We then fix our notation about automata. An automaton is a quintuple $A = (Q, \Sigma, \delta, q_0, F)$
where $Q$ is the finite set of states, $\Sigma$ is an alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is the set of transitions,
$q_0 \in Q$ is the initial state and $F \subseteq Q$ are the final states. For a transition $t \in \delta$, where
$t = (q, a, q')$, we define $\mathrm{From}(t) = q$ and $\mathrm{To}(t) = q'$. Moreover, we define $\mu_A: \delta^* \to \Sigma^*$ to be the
morphism defined by $\mu_A(t) = a$, and we write $\mu$ when $A$ is clear from the context. A path on $A$
is a word $\pi = t_1 \cdots t_n \in \delta^*$ such that $\mathrm{To}(t_i) = \mathrm{From}(t_{i+1})$ for $1 \le i < n$; we extend From and To
to paths, letting $\mathrm{From}(\pi) = \mathrm{From}(t_1)$ and $\mathrm{To}(\pi) = \mathrm{To}(t_n)$. We say that $\mu(\pi)$ is the label of $\pi$.
A path $\pi$ is said to be accepting if $\mathrm{From}(\pi) = q_0$ and $\mathrm{To}(\pi) \in F$; we let $\mathrm{Run}(A)$ be the language
over $\delta$ of accepting paths on $A$. We then define $L(A)$, the language of $A$, as the labels of the
accepting paths.

3. Parikh automata 

The following notations will be used in defining Parikh finite word automata (PA) formally. Let
$\Sigma$ be an alphabet, $d \in \mathbb{N}^+$, and $D$ a finite subset of $\mathbb{N}^d$. Following [16], the monoid morphism
from $(\Sigma \times D)^*$ to $\Sigma^*$ defined by $(a, \bar{v}) \mapsto a$ is called the projection on $\Sigma$ and the monoid
morphism from $(\Sigma \times D)^*$ to $\mathbb{N}^d$ defined by $(a, \bar{v}) \mapsto \bar{v}$ is called the extended Parikh image.

Remark. Let $\Sigma = \{a_1, \ldots, a_n\}$ and $D \subseteq \mathbb{N}^n$. If a word $u \in (\Sigma \times D)^*$ is in $\{(a_i, \bar{e}_i) \mid 1 \le i \le n\}^*$,
then the extended Parikh image of $u$ is the Parikh image of its projection on $\Sigma$.

Definition 1 (Parikh automaton [16]). Let $\Sigma$ be an alphabet, $d \in \mathbb{N}^+$, and $D$ a finite subset
of $\mathbb{N}^d$. A Parikh automaton (PA) of dimension $d$ over $\Sigma \times D$ is a pair $(A, C)$ where $A$ is a finite
automaton over $\Sigma \times D$, and $C \subseteq \mathbb{N}^d$ is a semilinear set. The PA language, written $L(A, C)$, is
the projection on $\Sigma$ of the words of $L(A)$ whose extended Parikh image is in $C$. The PA is said
to be deterministic (DetPA) if for every state $q$ of $A$ and every $a \in \Sigma$, there exists at most one
pair $(q', \bar{v})$ with $q'$ a state and $\bar{v} \in D$ such that $(q, (a, \bar{v}), q')$ is a transition of $A$. We write $\mathcal{L}_{\mathrm{PA}}$
(resp. $\mathcal{L}_{\mathrm{DetPA}}$) for the class of languages recognized by PA (resp. DetPA).
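To make Definition 1 concrete, here is a minimal Python sketch of PA membership. It is an illustration under our own assumptions: transitions are encoded as tuples `(source, letter, vector, target)`, the semilinear set $C$ is replaced by an arbitrary Python predicate, and the example automaton for $\{a^n b^n \mid n \in \mathbb{N}\}$ is ours rather than one taken from the text.

```python
def pa_accepts(word, transitions, initial, finals, constraint, dim):
    """Decide membership by tracking every reachable (state, vector) pair.

    The vector component is the extended Parikh image of the word over
    Sigma x D read so far along one run of the automaton.
    """
    configs = {(initial, (0,) * dim)}
    for letter in word:
        nxt = set()
        for state, vec in configs:
            for (src, a, v, dst) in transitions:
                if src == state and a == letter:
                    nxt.add((dst, tuple(x + y for x, y in zip(vec, v))))
        configs = nxt
    return any(q in finals and constraint(vec) for q, vec in configs)

# Illustrative PA of dimension 2: the automaton reads a*b*, each a adds (1, 0),
# each b adds (0, 1), and the constraint set is {(x, y) | x = y}.
TRANSITIONS = [
    ("qa", "a", (1, 0), "qa"),
    ("qa", "b", (0, 1), "qb"),
    ("qb", "b", (0, 1), "qb"),
]

def anbn(word):
    return pa_accepts(word, TRANSITIONS, "qa", {"qa", "qb"},
                      lambda v: v[0] == v[1], 2)
```

This example happens to be deterministic in the sense of the definition: for each state and letter there is at most one outgoing pair (target, vector).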

An alternative view of the PA will prove very useful. Indeed we note that a PA can be viewed 
equivalently as an automaton that applies a semilinear constraint on the counts of the individual 
transitions occurring along its accepting runs. To explain this, let $(A, C)$ be a PA of dimension
$d$, and let $\delta = \{t_1, \ldots, t_n\}$ be the transitions of $A$. Consider the automaton $B$ which is a copy
of $A$ except that the vector part of the transitions is dropped, and suppose there is a natural
bijection between the transitions of the two automata. Let $\pi$ be a path in $A$; the contribution
to the extended Parikh image of $\mu(\pi)$ of the transition $t_i = (p, (a, \bar{v}_i), q)$ is $\bar{v}_i$; thus, knowing
how many times $t_i$ appears in the path traced by $\pi$ in $B$ is enough to retrieve the value of the
extended Parikh image of $\mu(\pi)$. Now note that the bijection exists if no two distinct transitions
$t_i, t_j$ are such that $t_i = (p, (a, \bar{v}_i), q)$ and $t_j = (p, (a, \bar{v}_j), q)$. However, if such $t_i$ and $t_j$ exist,
we can replace them by $t = (p, (a, \bar{e}_{d+1}), q)$, incrementing in the process the dimension of the PA,
and change $C$ to $C'$ defined by $(\bar{v}, c) \in C' \Leftrightarrow (\exists c_i)(\exists c_j)[c = c_i + c_j \wedge \bar{v} + c_i.\bar{v}_i + c_j.\bar{v}_j \in C]$
without changing the language of the PA. It is thus readily seen that the following defines
models equivalent to the PA^2 and the DetPA:

Definition 2 (Constrained automaton). A constrained automaton (CA) over an alphabet $\Sigma$
is a pair $(A, C)$ where $A$ is a finite automaton over $\Sigma$ with $d$ transitions, and $C \subseteq \mathbb{N}^d$ is a
semilinear set. Its language is $L(A, C) = \{\mu(\pi) \mid \pi \in \mathrm{Run}(A) \wedge \Phi(\pi) \in C\}$. The CA is said to
be deterministic (DetCA) if $A$ is deterministic.
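A constrained automaton can be simulated by counting how often each transition is used. The sketch below is a hedged illustration: the encoding of transitions as a Python list (whose position gives the index $i$ of $t_i$), the use of a predicate in place of a semilinear set, and the particular nondeterministic CA chosen for the language $\{a, b\}^* \cdot \{a^n \# a^n \mid n \in \mathbb{N}\}$ are all our own assumptions.

```python
def ca_accepts(word, delta, initial, finals, constraint):
    """delta is a list of transitions (src, letter, dst); entry i plays the role of t_i.

    Each configuration pairs a state with the Parikh image (over delta)
    of the path followed so far.
    """
    configs = {(initial, (0,) * len(delta))}
    for letter in word:
        nxt = set()
        for state, counts in configs:
            for i, (src, a, dst) in enumerate(delta):
                if src == state and a == letter:
                    bumped = counts[:i] + (counts[i] + 1,) + counts[i + 1:]
                    nxt.add((dst, bumped))
        configs = nxt
    return any(q in finals and constraint(c) for q, c in configs)

# Hypothetical CA: q0 reads the {a,b}* prefix, q1 counts the a's before '#',
# q2 counts the a's after '#'.
DELTA = [
    ("q0", "a", "q0"),  # index 0
    ("q0", "b", "q0"),  # index 1
    ("q0", "a", "q1"),  # index 2: guess that the final block of a's starts here
    ("q1", "a", "q1"),  # index 3
    ("q1", "#", "q2"),  # index 4
    ("q0", "#", "q2"),  # index 5: case n = 0
    ("q2", "a", "q2"),  # index 6
]

def equal(word):
    # Constraint: a's counted before '#' (indices 2 and 3) equal a's after it (index 6).
    return ca_accepts(word, DELTA, "q0", {"q2"}, lambda c: c[2] + c[3] == c[6])
```

The nondeterministic choice between indices 0 and 2 is what lets the automaton guess where the block $a^n$ begins.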

3.1. On the expressiveness of Parikh automata 

The constrained automaton characterization of PA helps deriving pumping-style necessary
conditions for membership in $\mathcal{L}_{\mathrm{PA}}$ and in $\mathcal{L}_{\mathrm{DetPA}}$.

Lemma 1. Let $L \in \mathcal{L}_{\mathrm{PA}}$. There exist $p, \ell \in \mathbb{N}^+$ such that any $w \in L$ with $|w| > \ell$ can be
written as $w = uvxvz$ where:

1. $0 < |v| \le p$, $|x| > p$, and $|uvxv| \le \ell$,

2. $uv^2xz \in L$ and $uxv^2z \in L$.

Proof. Let $(A, C)$ be a CA of language $L$. Let $p$ be the number of states in $A$ and $m$ be the
number of elementary cycles (i.e., cycles in which no state except the start state occurs twice)
in the underlying multigraph of $A$. Finally, let $\ell = p \times (2m + 1)$. Now, let $w \in L$ such that
$|w| > \ell$ and $\pi \in \mathrm{Run}(A)$ such that $\mu(\pi) = w$ and $\Phi(\pi) \in C$. Write $\pi$ as $\pi_1 \cdots \pi_{2m+1}\rho$ where
$|\pi_i| = p$. By the pigeonhole principle, each $\pi_i$ contains an elementary cycle, and thus, there
exist $1 \le i, j \le 2m + 1$ with $i + 1 < j$ such that $\pi_i$ and $\pi_j$ share the same cycle $\eta_v$ labeled with
a word $v$. Write:

• $\pi_i$ as $\pi_{i,1}\eta_v\pi_{i,2}$, and $\pi_j$ as $\pi_{j,1}\eta_v\pi_{j,2}$,

• $\eta_u$ for $\pi_1 \cdots \pi_{i-1}\pi_{i,1}$ and $u$ for $\mu(\eta_u)$,

• $\eta_x$ for $\pi_{i,2}\pi_{i+1} \cdots \pi_{j-1}\pi_{j,1}$ and $x$ for $\mu(\eta_x)$,

• $\eta_z$ for $\pi_{j,2}\pi_{j+1} \cdots \pi_{2m+1}\rho$ and $z$ for $\mu(\eta_z)$.

Then $\pi = \eta_u\eta_v\eta_x\eta_v\eta_z$ and $w = uvxvz$. Moreover, both $\pi' = \eta_u\eta_v^2\eta_x\eta_z$ and $\pi'' = \eta_u\eta_x\eta_v^2\eta_z$ are
accepting paths with the same Parikh image as $\pi$. Thus, $\mu(\pi') = uv^2xz \in L$ and $\mu(\pi'') =
uxv^2z \in L$. Moreover, $0 < |v| \le p$, $|x| > p$ and $|uvxv| \le \ell$. □

A similar argument leads to a stronger property for the languages belonging to $\mathcal{L}_{\mathrm{DetPA}}$:

Lemma 2. Let $L \in \mathcal{L}_{\mathrm{DetPA}}$. There exist $p, \ell \in \mathbb{N}^+$ such that any $w$ over the alphabet of $L$ with
$|w| > \ell$ can be written as $w = uvxvz$ where:

1. $0 < |v| \le p$, $|x| > p$ and $|uvxv| \le \ell$,

2. $uv^2x$, $uvxv$ and $uxv^2$ are equivalent w.r.t. the Nerode relation of $L$.



^2 Another equivalent view of PA languages suggested by one referee is as sets $R^{-1}(X)$ where $R$ is a rational
relation over $\Sigma^* \times \mathbb{N}^d$ and $X$ is a rational subset of $\mathbb{N}^d$. An artificial further restriction to this viewpoint would
serve to capture DetPA languages.

We apply Lemma 1 to the language COPY, defined as $\{w\#w \mid w \in \{a, b\}^*\}$, as follows:

Proposition 3. $\mathrm{COPY} \notin \mathcal{L}_{\mathrm{PA}}$.

Proof. Suppose $\mathrm{COPY} \in \mathcal{L}_{\mathrm{PA}}$. Let $\ell, p$ be given by Lemma 1 and consider $w = (a^p b)^\ell \# (a^p b)^\ell \in
\mathrm{COPY}$. Lemma 1 states that $w = uvxvz$ where $uvxv$ lies in the first half of $w$ and $s = uv^2xz \in
\mathrm{COPY}$. Note that $x$ contains at least one $b$. Suppose $v = a^i$ for $1 \le i \le p$, then there is a
sequence of $a$'s in the first half of $s$ unmatched in the second half. Likewise, if $v$ contains a $b$,
then $s$ has a sequence of $a$'s between two $b$'s unmatched in the second half. Thus $s \notin \mathrm{COPY}$,
a contradiction. Hence $\mathrm{COPY} \notin \mathcal{L}_{\mathrm{PA}}$. □

As Klaedtke and Rueß show using closure properties, DetPA are strictly weaker than PA. The
finer grain of Lemma 2 suggests explicit languages that witness the separation of $\mathcal{L}_{\mathrm{DetPA}}$
from $\mathcal{L}_{\mathrm{PA}}$. Indeed, let $\mathrm{EQUAL} \subseteq \{a, b, \#\}^*$ be the language $\{a, b\}^* \cdot \{a^n \# a^n \mid n \in \mathbb{N}\}$, we have:

Proposition 4. $\mathrm{EQUAL} \in \mathcal{L}_{\mathrm{PA}} \setminus \mathcal{L}_{\mathrm{DetPA}}$.

Proof. We omit the proof that $\mathrm{EQUAL} \in \mathcal{L}_{\mathrm{PA}}$. Now, suppose $\mathrm{EQUAL} \in \mathcal{L}_{\mathrm{DetPA}}$, and let $\ell, p$
be given by Lemma 2. Consider $w = (a^p b)^\ell$. Lemma 2 then asserts that a prefix of $w$ can be
written as $w_1 = uvxv$, and that $w_2 = uv^2x$ verifies $w_1 \equiv w_2$. As $|x| > p$, $x$ contains a $b$. Let
$k$ be the number of $a$'s at the end of $w_1$. Suppose $v = a^i$ for $1 \le i \le p$, then $w_2$ ends with
$k - i < k$ letters $a$. Thus $w_1\#a^k \in \mathrm{EQUAL}$ and $w_2\#a^k \notin \mathrm{EQUAL}$, a contradiction. Suppose
then that $v = a^i b a^k$, with $0 \le i + k < p$. Then $w_2$ ends with $p - i > k$ letters $a$, and similarly,
$w_1 \not\equiv w_2$, a contradiction. Thus $\mathrm{EQUAL} \notin \mathcal{L}_{\mathrm{DetPA}}$. □

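For comparison, we mention another line of attack for the study of $\mathcal{L}_{\mathrm{DetPA}}$. The proof is
omitted, but is based on the number of possible configurations of a PA, which is polynomial
in the length of the input word. Klaedtke and Rueß used a similar argument to show that
$\mathrm{PAL} = \{w\#w^R \mid w \in \{a, b\}^+\}$, where $w^R$ is the reversal of $w$, is not in $\mathcal{L}_{\mathrm{PA}}$.

Lemma 5. Let $L \in \mathcal{L}_{\mathrm{DetPA}}$. Then there exists $c > 0$ such that $|\{[w]_L \mid w \in \Sigma^n\}| \in O(n^c)$.

Proposition 6. Let $L = \{w \in \{a, b\}^* \mid w_{|w|_a} = b\}$, where $w_i$ is the $i$-th letter of $w$. Then
$L \in \mathcal{L}_{\mathrm{PA}} \setminus \mathcal{L}_{\mathrm{DetPA}}$.

Proof. We omit the proof that $L \in \mathcal{L}_{\mathrm{PA}}$; the main point is simply to guess the position of the
$b$ referenced by $|w|_a$. On the other hand, let $n > 0$ and $u, v \in \{a, b\}^n$ such that $|u|_a = |v|_a = \frac{n}{2}$
and there exists $p \in \{\frac{n}{2}, \ldots, n\}$ with $u_p \ne v_p$. Let $w = a^{p - \frac{n}{2}}$, then $(uw)_{|uw|_a} = (uw)_{|u|_a + |w|_a} =
(uw)_p = u_p$, and similarly, $(vw)_{|vw|_a} = v_p$. This implies $uw \in L \Leftrightarrow vw \notin L$, thus $u \not\equiv v$. Then
for $0 \le i \le \frac{n}{2}$, define $E_i = \{a^{\frac{n}{2} - i} b^i z \mid z \in \{a, b\}^{\frac{n}{2}} \wedge |z|_a = i\}$. For any $u, v \in \bigcup E_i$ with
$u \ne v$, the previous discussion shows that $u \not\equiv v$. Thus
$|\{[w]_L \mid w \in \{a, b\}^n\}| \ge |\bigcup_{i=0}^{n/2} E_i| = \sum_{i=0}^{n/2} |E_i| = \sum_{i=0}^{n/2} \binom{n/2}{i} = 2^{n/2} \notin O(n^{O(1)})$.
Lemma 5 then implies that $L \notin \mathcal{L}_{\mathrm{DetPA}}$. □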
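The counting argument in the proof of Proposition 6 can be checked by brute force for small even $n$. The following Python sketch is only a sanity check under our own naming (`in_L`, `family_E`, `separating_suffix` are hypothetical helpers): it builds $\bigcup_i E_i$ and searches for a suffix $a^j$ that separates two of its words with respect to the Nerode relation of $L$.

```python
from itertools import product

def in_L(word):
    """Membership in L = {w | the |w|_a-th letter of w (1-indexed) is b}."""
    k = word.count("a")                  # k = |w|_a
    return k >= 1 and word[k - 1] == "b"

def family_E(n):
    """All words a^(n/2 - i) b^i z with |z| = n/2 and |z|_a = i, for even n."""
    half = n // 2
    words = []
    for i in range(half + 1):
        for z in product("ab", repeat=half):
            if z.count("a") == i:
                words.append("a" * (half - i) + "b" * i + "".join(z))
    return words

def separating_suffix(u, v, max_len):
    """Return some a^j (j <= max_len) with exactly one of uw, vw in L, else None."""
    for j in range(max_len + 1):
        w = "a" * j
        if in_L(u + w) != in_L(v + w):
            return w
    return None
```

For $n = 6$ the family has $2^3 = 8$ words, and every pair of distinct words is separated by some suffix $a^j$ with $j \le 3$, in line with the proof.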

3.2. On decidability and closure properties of Parikh automata 

The following table summarizes decidability results for PA and DetPA. The results in bold are
new, while the others are from [16] and [12]:

Proposition 7. (1) Finiteness is decidable for PA. (2) Inclusion is decidable for DetPA and
undecidable for PA. (3) Regularity is undecidable for PA.

Proof. (1). Let $(A, C)$ be a CA. Then $\mathrm{Run}(A)$ is a regular language, and thus, its Parikh
image is effectively semilinear (this is a special case of Parikh's theorem [20]). It follows that
the language described by $A$ and $C$ is finite if and only if $\Phi(\mathrm{Run}(A)) \cap C$ is finite, which is
decidable. (2). Decidability of inclusion for DetPA follows from the fact that $\mathcal{L}_{\mathrm{DetPA}}$ is closed
under complement and intersection, and that the emptiness problem is decidable for DetPA.
(In fact, it is decidable whether the language of a PA is included in the language of a DetPA.)
Undecidability of inclusion for PA follows immediately from the undecidability of the universe
problem for PA. (3). This follows from a theorem of [11], which states the following: Let $\mathcal{C}$ be
a class of languages closed under union and under concatenation with regular languages. Let $P$
be a predicate on languages true of every regular language, false of some languages, preserved
by inverse rational transduction, union with $\{\varepsilon\}$ and intersection with regular languages. Then
$P$ is undecidable in $\mathcal{C}$. Obviously, $\mathcal{L}_{\mathrm{PA}}$ satisfies the hypothesis for $\mathcal{C}$. Moreover, "being regular in
$\mathcal{L}_{\mathrm{PA}}$" is a predicate satisfying the hypothesis for $P$. Thus, regularity is undecidable for PA. □

We now further the study of closure properties of PA and DetPA started in [16]. The following
table collects the closure properties of PA and DetPA, where $h$ is a morphism and $c$ is the
commutative closure. In bold are the results of the present paper, while the other results can be
found in [16] (detailed proofs by Karianto can be found in [14]):

As the language EQUAL separating $\mathcal{L}_{\mathrm{DetPA}}$ from $\mathcal{L}_{\mathrm{PA}}$ is the concatenation of a regular language
and a language of $\mathcal{L}_{\mathrm{DetPA}}$, we have:

Proposition 8. $\mathcal{L}_{\mathrm{DetPA}}$ is not closed under concatenation.

Proposition 9. (1) The commutative closure of any semilinear language is in $\mathcal{L}_{\mathrm{DetPA}}$. (2)
$\mathcal{L}_{\mathrm{DetPA}}$ is not closed under morphisms.

Proof. (1). Let $\Sigma = \{a_1, \ldots, a_n\}$, $L \subseteq \Sigma^*$ a semilinear language, and $C = \Phi(L)$. Define $A$
to be an automaton with one state, initial and final, with $n$ loops, the $i$-th labeled $(a_i, \bar{e}_i) \in
\Sigma \times \{\bar{e}_i\}_{1 \le i \le n}$. Then $c(L) = L(A, C)$. (2) is straightforward as any language of $\mathcal{L}_{\mathrm{PA}}$ is the image
by a morphism of a language in $\mathcal{L}_{\mathrm{DetPA}}$. Indeed, say $(A, C)$ is a CA and let $B$ be the copy of
$A$ in which the transition $t$ is relabeled $t$; then $B$ is deterministic and $L(A, C) = \mu_A(L(B, C))$.
This implies the nonclosure of $\mathcal{L}_{\mathrm{DetPA}}$ under morphisms. □

Note that (1) from Proposition 9 implies that both $\mathcal{L}_{\mathrm{PA}}$ and $\mathcal{L}_{\mathrm{DetPA}}$ are closed under
commutative closure, as both are classes of semilinear languages [16].

Proposition 10. Neither $\mathcal{L}_{\mathrm{PA}}$ nor $\mathcal{L}_{\mathrm{DetPA}}$ is closed under starring.

Proof. We show that the starring of $L = \{a^n b^n \mid n \in \mathbb{N}\}$ is not in $\mathcal{L}_{\mathrm{PA}}$. Suppose $L^* \in \mathcal{L}_{\mathrm{PA}}$, and
let $w = (a^p b^p)^\ell$, where $\ell, p$ are given by Lemma 1. The same lemma asserts that $w = uvxvz$,
such that, in particular, $uv^2xz$ and $uxv^2z$ are in $L^*$. Now suppose $v = a^i$ for some $i \le p$. Then
$uv^2x$ contains $a^{p+i}b^p$ with no more $b$'s on the right. Thus $uv^2xz \notin L^*$. The case for $v = b^i$ is
similar. Now suppose $v = a^i b^j$ with $i, j > 0$. Then $uv^2x$ contains $\cdots a^i b^j a^i b^j \cdots$, but $i < p$,
thus $uv^2xz \notin L^*$. The case $v = b^i a^j$ is similar. Thus $L^* \notin \mathcal{L}_{\mathrm{PA}}$. □

Remark. Baker and Book [1] already note, in different terms, that if $\mathcal{L}_{\mathrm{PA}}$ were closed under
starring, it would be an intersection closed full AFL containing $\{a^n b^n \mid n \ge 0\}$, and so would
be equal to the class of Turing-recognizable languages. Thus $\mathcal{L}_{\mathrm{PA}}$ is not closed under starring.

3.3. Parikh automata and reversal-bounded counter machines 

Klaedtke and Rueß noticed in [15] that Parikh automata recognize the same languages as
reversal-bounded counter machines, a model introduced by Ibarra [12]:

Definition 3 (Reversal-bounded counter machine [12]). A one-way, $k$-counter machine $M$ is
a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ is an alphabet, $\delta \subseteq Q \times (\Sigma \cup \{\dashv\}) \times
\{0, 1\}^k \times Q \times \{S, R\} \times \{-1, 0, +1\}^k$ is the transition function, $q_0 \in Q$ is the initial state and
$F \subseteq Q$ is the set of final states. Moreover, we suppose $\dashv\ \notin \Sigma$. The machine is deterministic if
for any $(p, \ell, \bar{x})$, there exists at most one $(q, h, \bar{v})$ such that $(p, \ell, \bar{x}, q, h, \bar{v}) \in \delta$. On input $w$, the
machine starts with a read-only tape containing $w\dashv$, and its head on the first character of $w$.
Write $c_i$ for the $i$-th counter, then a transition $(p, \ell, \bar{x}, q, h, \bar{v}) \in \delta$ is taken if the machine is in
state $p$, reading character $\ell$ and $c_i = 0$ if $x_i = 0$ and $c_i > 0$ if $x_i = 1$, for all $i$. The machine
then enters state $q$, its head is moved to the right iff $h = R$, and $\bar{v}$ is added to the counters.
If the head falls off the tape, or if a counter turns negative, the machine rejects. A word is
accepted if an execution leads to a final state. The machine is reversal-bounded (RBCM) if there
exists an integer $r$ such that any accepting run changes between increments and decrements of
the counters a (bounded) number of times less than $r$. We write DetRBCM for deterministic
RBCM. We write $\mathcal{L}_{\mathrm{RBCM}}$ (resp. $\mathcal{L}_{\mathrm{DetRBCM}}$) for the class of languages recognized by RBCM
(resp. DetRBCM).
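The operational reading of Definition 3 can be sketched as a small simulator for deterministic machines. This is an illustration under our own assumptions: the end marker is represented by the character `$`, the transition function is a Python dictionary keyed by (state, character, test vector), the `max_steps` bound is a safeguard we add against non-terminating stay moves, and the reversal bound itself is not checked. The example machine for $\{a^n b^n \mid n \in \mathbb{N}\}$ is ours.

```python
END = "$"  # stand-in for the end-of-tape marker, assumed not to be in the alphabet

def run_counter_machine(word, delta, initial, finals, k, max_steps=10_000):
    """Simulate a deterministic one-way k-counter machine on `word`.

    delta maps (state, character, tests) to (state', move, increments), where
    tests[i] is 0 if counter i must be zero and 1 if it must be positive.
    """
    tape = list(word) + [END]
    state, head, counters = initial, 0, [0] * k
    for _ in range(max_steps):
        if state in finals:
            return True
        if head >= len(tape):
            return False                      # the head fell off the tape
        tests = tuple(1 if c > 0 else 0 for c in counters)
        step = delta.get((state, tape[head], tests))
        if step is None:
            return False                      # no applicable transition
        state, move, incr = step
        counters = [c + d for c, d in zip(counters, incr)]
        if any(c < 0 for c in counters):
            return False                      # a counter turned negative
        if move == "R":
            head += 1
    return state in finals

# One-counter machine with a single reversal: count up on a's, down on b's.
ANBN = {
    ("q0", "a", (0,)): ("q0", "R", (+1,)),
    ("q0", "a", (1,)): ("q0", "R", (+1,)),
    ("q0", "b", (1,)): ("q1", "R", (-1,)),
    ("q1", "b", (1,)): ("q1", "R", (-1,)),
    ("q0", END, (0,)): ("qf", "S", (0,)),
    ("q1", END, (0,)): ("qf", "S", (0,)),
}
```

A call such as `run_counter_machine("aabb", ANBN, "q0", {"qf"}, 1)` follows the unique applicable transition at each step and accepts once `qf` is reached.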

In [15, Section A.3], it is shown that PA have the same expressive power as (nondeterministic)
RBCM. Although Fact 30 of [15], on which the authors rely to prove this result, is technically
false as stated,^3 the small gap there can be fixed so that:

Proposition 11 ([15]). $\mathcal{L}_{\mathrm{PA}} = \mathcal{L}_{\mathrm{RBCM}}$.

^3 Fact 30 of [15] states the following. Consider a RBCM $M$ which, for any counter, changes between increment
and decrement only once. Let $M'$ be $M$ in which negative counter values are allowed and the zero-tests are
ignored. Then a word is claimed to be accepted by $M$ iff the run of $M'$ on the same word reaches a final state
with all its counters nonnegative. A counter-example is the following. Take $A$ to be the minimal automaton for
$a^*b$, and add a counter for the number of $a$'s that blocks the transition labeled $b$ unless the counter is nonzero.
This machine recognizes $a^+b$. Then by removing this test, the machine now accepts $b$.

Further, we study how the notion of determinism compares in the two models. Let $\mathrm{NSUM} =
\{a^n \natural b^{m_1} \# b^{m_2} \# \cdots \# b^{m_k} \natural c^{m_1 + \cdots + m_n} \mid k \ge n \ge 0 \wedge (\forall i)[m_i \in \mathbb{N}]\}$: the number of $a$'s is the
number of $m_i$'s to add to get the number of $c$'s. Note that NSUM is not context-free. Then:

Proposition 12. $\mathcal{L}_{\mathrm{DetPA}} \subsetneq \mathcal{L}_{\mathrm{DetRBCM}}$ and $\mathrm{NSUM} \in \mathcal{L}_{\mathrm{DetRBCM}} \setminus \mathcal{L}_{\mathrm{DetPA}}$.

Proof. We first show that $\mathcal{L}_{\mathrm{DetPA}} \subseteq \mathcal{L}_{\mathrm{DetRBCM}}$. Let $(A, C)$ be a CA, where $A = (Q, \Sigma, \delta, q_0, F)$
is deterministic and let $\delta = \{t_1, \ldots, t_k\}$. We define a DetRBCM of the same language in two
steps. (1) First, let $M$ be the $k$-counter machine $(Q \cup \{q_f\}, \Sigma, \zeta, q_0, \{q_f\})$, where $q_f \notin Q$ and $\zeta$ is
defined by:

$$\zeta = \bigcup_{\bar{x} \in \{0,1\}^k} \Big( \{(q, a, \bar{x}, q', R, \bar{e}_i) \mid t_i = (q, a, q')\} \cup \{(q, \dashv, \bar{x}, q_f, S, \bar{0}) \mid q \in F\} \Big).$$

This machine (trivially a DetRBCM) does not make any test, and accepts (in $q_f$) precisely the
words accepted by $A$. Moreover, the state of the counters in $q_f$ is the Parikh image of the path
taken (in $A$) to recognize the input word. (2) We then refine $M$ to check that the counter
values belong to $C$. We note that we can do that as a direct consequence of the proof of [13,
Theorem 3.5], but this proof relied on nontrivial algebraic properties of systems $Ay = b$, where
$A$ is a matrix, $y$ are unknowns and $b$ is a vector; we present here an elementary proof. Recall
that $C$ can be expressed as a quantifier-free first-order formula which uses the function symbol
$+$, the congruence relations $\equiv_i$, for $i \ge 2$, and the order relation $<$ (see, e.g., [7]). So let $C$
be given as such a formula $\phi_C$ with $k$ free variables. Let $\phi_C$ be put in disjunctive normal form.
The machine $M$ then tries each and every clause of $\phi_C$ for acceptance. First, note that a term
can be computed with a number of counters and reversals which depends only on its size: for
instance, computing $c_i + c_j$ requires two new counters $x, y$; $c_i$ is decremented until it reaches
0, while $x$ and $y$ are incremented, so that their value is $c_i$; now decrement $y$ until it reaches 0
while incrementing $c_i$ back to its original value; then do the same process with $c_j$: as a result,
$x$ is now $c_i + c_j$. Second, note that any atomic formula ($t_1 < t_2$ or $t_1 \equiv_i t_2$) can be checked by
a DetRBCM: for $t_1 < t_2$, compute $x_1 = t_1$ and $x_2 = t_2$, then decrement $x_1$ and $x_2$ until one
of them reaches 0; if $x_1$ reaches 0 while $x_2$ is still positive, then the atomic formula is true, and
false otherwise; for $t_1 \equiv_i t_2$, a simple automaton-based construction depending on $i$ can decide if the atomic
formula is true. Thus, a DetRBCM can decide, for each clause, if all of its atomic formulas (or
negation) are true, and in this case, accept the word. This process does not use the read-only
head, and uses a number of counters and a number of reversals bounded by the length of $\phi_C$.

We now show that $\mathrm{NSUM} \in \mathcal{L}_{\mathrm{DetRBCM}} \setminus \mathcal{L}_{\mathrm{DetPA}}$. We omit the fact that $\mathrm{NSUM} \in \mathcal{L}_{\mathrm{DetRBCM}}$.
Now suppose $(A, C)$ is a DetPA such that $L(A, C) = \mathrm{NSUM}$, with $A = (Q, \Sigma \times D, \delta, q_0, F)$ also
deterministic. We may suppose that the projection on $\Sigma$ of $L(A)$ is a subset of $a^* \natural (b^* \#)^* b^* \natural c^*$,
so that there exist $k \ge 0$, $q_1, \ldots, q_k \in Q$, and $j \in \{0, \ldots, k\}$ such that $(q_i, (a, \bar{v}_i), q_{i+1}) \in \delta$, for
$0 \le i < k$ and some $\bar{v}_i$'s, and $(q_k, (a, \bar{v}_k), q_j) \in \delta$. Moreover, we may suppose that no other
transition points to one of the $q_i$'s, and that all transitions $t = (q_i, (\ell, \bar{v}), q) \in \delta$ such that
$q \notin \{q_0, \ldots, q_k\}$ are with $\ell = \natural$; let $T$ be the set of all such transitions $t$. We define $|T|$ DetPA
such that the union of their languages is $\mathrm{SUMN} = \{\natural w \natural a^n \mid a^n \natural w \in \mathrm{NSUM}\}$, that is, the
strings of NSUM with $a^n$ pushed at the end. For $t \in T$, define $A_t$ as the automaton similar to $A$
but which starts with the transition $t$ and delays the first part of the computation until the very
end. Formally, $A_t = (Q \cup \{q_0'\}, \Sigma \times D, \delta_t, q_0', \{\mathrm{From}(t)\})$ where $\delta_t = (\delta \setminus T) \cup \{(q_0', \mu(t), \mathrm{To}(t))\} \cup
\{(q_f, (\natural, \bar{0}), q_0) \mid q_f \in F\}$ with $q_0'$ a fresh state. Now for $\omega \in L(A)$, let $t$ be the transition labeled
$\natural$ taken when $A$ reads $\omega$, and let $\omega = \omega_1\mu(t)\omega_2$. Then $\mu(t)\omega_2(\natural, \bar{0})\omega_1 \in L(A_t)$ and this word
has the same extended Parikh image as $\omega$. Thus we have that $\bigcup_{t \in T} L(A_t, C) = \mathrm{SUMN}$, and
if $\mathrm{NSUM} \in \mathcal{L}_{\mathrm{DetPA}}$, then $\mathrm{SUMN} \in \mathcal{L}_{\mathrm{DetPA}}$. A proof similar to Proposition 4 then shows that
$\mathrm{SUMN} \notin \mathcal{L}_{\mathrm{DetPA}}$, a contradiction; thus $\mathrm{NSUM} \notin \mathcal{L}_{\mathrm{DetPA}}$. □
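Step (2) of the proof above computes terms and comparisons using only increments, decrements and zero tests. The following Python sketch mimics those two routines on ordinary integers; it is an illustration of the counter discipline rather than an actual DetRBCM, and the function names are ours.

```python
def add_counters(c, i, j):
    """Return (x, counters) with x = c[i] + c[j], using only +1, -1 and zero tests.

    The auxiliary counters x and y start at 0; the input counters are restored.
    """
    c = list(c)
    x = y = 0
    for src in (i, j):
        # Drain c[src] into both x and y.
        while c[src] != 0:
            c[src] -= 1
            x += 1
            y += 1
        # Drain y back into c[src], restoring its original value.
        while y != 0:
            y -= 1
            c[src] += 1
    return x, c

def less_than(x1, x2):
    """Decide x1 < x2 by decrementing both counters until one of them reaches zero."""
    while x1 != 0 and x2 != 0:
        x1 -= 1
        x2 -= 1
    return x1 == 0 and x2 != 0
```

Each counter involved changes direction a bounded number of times per call, which is the reason the number of reversals depends only on the size of the term being computed.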

The parallel drawn between (Det)PA and (Det)RBCM allows transferring some RBCM and 
DetRBCM results to PA and DetPA. An example is a consequence of the following lemma 
proved in 2011 by Chiniforooshan et al. [5] for the purpose of showing incomparability results 
between different models of reversal-bounded counter machines: 

Lemma 13 ([5]). Let a DetRBCM express $L \subseteq \Sigma^*$. Then there exists $w \in \Sigma^*$ such that $L \cap w\Sigma^*$
is a nontrivial regular language.

Variants of the language EQUAL from Proposition 4 can be shown outside $\mathcal{L}_{\mathrm{DetPA}}$ in this way.
For instance, for $\Sigma = \{a, b\}$, $\Sigma\mathrm{ANBN} = \Sigma^* \cdot \{a^n b^n \mid n \in \mathbb{N}\}$ is such that any $w \in \Sigma^*$ makes
$\Sigma\mathrm{ANBN} \cap w\Sigma^*$ nonregular. Although Lemma 13 thus gives languages in $\mathcal{L}_{\mathrm{PA}} \setminus \mathcal{L}_{\mathrm{DetPA}}$, Lemma 13
seemingly does not apply to EQUAL itself since $\mathrm{EQUAL} \cap \#\{a, b, \#\}^* = \{\#\}$ is regular.

4. Affine Parikh automata

A PA of dimension $d$ can be viewed as an automaton in which each transition updates a vector
$\bar{x}$ of $\mathbb{N}^d$ using a function $\bar{x} \leftarrow \bar{x} + \bar{v}$ where $\bar{v}$ depends only on the transition. At the end of an
accepting computation, the word is accepted if x belongs to some semilinear set. We propose to 
generalize the updating function to an affine function. We start by defining the model, and show 
that defining it over N is at least as general as defining it on Q. We study the expressiveness 
of this model, and show it is strictly more powerful than PA. We then note that deterministic 
such automata can be normalized so as to essentially trivialize their automaton component. 
We then study nonclosure properties and decidability problems associated with APA, leading 
to the observation that APA lack some desirable properties — e.g., properties usually needed 
for any real-world application. 

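In the following, we consider the vectors in $\mathbb{K}^d$ to be column vectors. Let $d, d' > 0$. A function
$f: \mathbb{K}^d \to \mathbb{K}^{d'}$ is a (total) affine function if there exist a matrix $M \in \mathbb{K}^{d' \times d}$ and $\bar{v} \in \mathbb{K}^{d'}$ such
that for any $\bar{x} \in \mathbb{K}^d$, $f(\bar{x}) = M.\bar{x} + \bar{v}$; it is linear if $\bar{v} = \bar{0}$. We note such a function $f = (M, \bar{v})$.
We write $\mathcal{F}_d$ for the set of affine functions from $\mathbb{K}^d$ to $\mathbb{K}^d$ and view $\mathcal{F}_d$ as the monoid $(\mathcal{F}_d, \circ)$
with $(f \circ g)(\bar{x}) = g(f(\bar{x}))$.

Definition 4 (Affine Parikh automaton). A $\mathbb{K}$-affine Parikh automaton ($\mathbb{K}$-APA) of dimension
$d$ is a triple $(A, U, C)$ where $A$ is an automaton with transition set $\delta$, $U$ is a morphism from $\delta^*$
to $\mathcal{F}_d$ and $C \subseteq \mathbb{K}^d$ is a $\mathbb{K}$-definable set; recall that $U$ need only be defined on $\delta$. The language
of the APA is $L(A, U, C) = \{\mu(\pi) \mid \pi \in \mathrm{Run}(A) \wedge (U(\pi))(\bar{0}) \in C\}$. The $\mathbb{K}$-APA is said to be
deterministic ($\mathbb{K}$-DetAPA) if $A$ is. We write $\mathcal{L}_{\mathbb{K}\text{-APA}}$ (resp. $\mathcal{L}_{\mathbb{K}\text{-DetAPA}}$) for the class of languages
recognized by $\mathbb{K}$-APA (resp. $\mathbb{K}$-DetAPA).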
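As a concrete reading of Definition 4, the sketch below evaluates a deterministic APA by applying the affine function of each transition to the register vector, starting from the all-zero vector. The encoding (dictionaries keyed by state and letter, a Python predicate for the constraint set) and the example language $\{a^n b^{2^n} \mid n \in \mathbb{N}\}$ are our own illustrative choices, not constructions taken from the text.

```python
def apply_affine(func, x):
    """Apply f = (M, v) to the column vector x, i.e. return M.x + v."""
    M, v = func
    return tuple(sum(m * xi for m, xi in zip(row, x)) + vi
                 for row, vi in zip(M, v))

def det_apa_accepts(word, delta, U, initial, finals, constraint, dim):
    """delta maps (state, letter) to a state; U maps (state, letter) to (M, v).

    Functions are applied in reading order, matching (f o g)(x) = g(f(x)).
    """
    state, x = initial, (0,) * dim
    for letter in word:
        if (state, letter) not in delta:
            return False
        x = apply_affine(U[(state, letter)], x)
        state = delta[(state, letter)]
    return state in finals and constraint(x)

# Registers (p, c): each a maps p to 2p + 1, so p = 2^n - 1 after n letters a;
# each b increments c. The constraint c = p + 1 then forces 2^n letters b.
DELTA = {("A", "a"): "A", ("A", "b"): "B", ("B", "b"): "B"}
DOUBLE = ([[2, 0], [0, 1]], (1, 0))
COUNT = ([[1, 0], [0, 1]], (0, 1))
U = {("A", "a"): DOUBLE, ("A", "b"): COUNT, ("B", "b"): COUNT}

def an_b2n(word):
    return det_apa_accepts(word, DELTA, U, "A", {"B"},
                           lambda x: x[1] == x[0] + 1, 2)
```

The Parikh image of this example language, $\{(n, 2^n) \mid n \in \mathbb{N}\}$, is not semilinear, so the language is not a PA language; the update $p \mapsto 2p + 1$ is affine but not of the form $\bar{x} + \bar{v}$.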
Remark. It is easily seen that $\mathbb{N}$-APA (resp. $\mathbb{N}$-DetAPA) are a generalization of CA (resp.
DetCA). Indeed, let $(A, C)$ be a CA, and let $\Phi$ be the Parikh image over the set $\delta$ of transitions
of $A$. Define, for $t \in \delta$, $U(t) = (\mathrm{Id}, \Phi(t))$ where Id is the identity matrix of dimension $|\delta| \times |\delta|$.
Then $L(A, C) = L(A, U, C)$; we will later see that this containment is strict.

The arguments used by Klaedtke and Rueß [15] apply equally well to $\mathbb{K}$-APA and $\mathbb{K}$-DetAPA,
showing:

Proposition 14. $\mathcal{L}_{\mathbb{K}\text{-APA}}$ and $\mathcal{L}_{\mathbb{K}\text{-DetAPA}}$ are effectively closed under union, intersection and
inverse morphisms. Moreover, $\mathcal{L}_{\mathbb{K}\text{-APA}}$ is closed under concatenation and nonerasing morphisms,
and $\mathcal{L}_{\mathbb{K}\text{-DetAPA}}$ is closed under complement.

We now show these models over N are at least as powerful as over Q. First, we need the 
following technical lemma: 

Lemma 15. For any K-APA (resp. K-DetAPA) there exists a K-APA (resp. K-DetAPA) 
where the functions associated with the transitions are linear, except for some transitions which 
can be taken only as the first transition of a nonempty run. 

Proof (sketch). Let $(A, U, C)$ be a $\mathbb{K}$-APA of dimension $d$, where the transition set of $A$ is
$\delta = \{t_1, \ldots, t_{|\delta|}\}$, and write $U(t_i) = (M_i, \bar{v}_i)$. Let $A'$ be a copy of $A$ in which a fresh state $q$ is
added, set to be the initial state, with the same outgoing transitions as the initial state of $A$
and no incoming transition. Let $t'_1, \ldots, t'_k$ be the new transitions in $A'$, and order $\delta$ such that
$t_1, \ldots, t_k$ are the corresponding transitions leaving the initial state of $A$. Now define $U'$, for
$\bar{x}, \bar{y}_1, \ldots, \bar{y}_{|\delta|} \in \mathbb{K}^d$, by $U'(t'_i): (\bar{x}, \bar{y}_1, \ldots, \bar{y}_{|\delta|}) \mapsto (\bar{v}_i, \bar{v}_1, \ldots, \bar{v}_{|\delta|})$, and $U'(t_i): (\bar{x}, \bar{y}_1, \ldots, \bar{y}_{|\delta|}) \mapsto
(M_i.\bar{x} + \bar{y}_i, \bar{y}_1, \ldots, \bar{y}_{|\delta|})$. Finally define $C' = C \cdot \mathbb{K}^{d \times |\delta|}$. Then $L(A', U', C') = L(A, U, C)$, and $A'$
is deterministic if $A$ is. Moreover, the only nonlinear functions given by $U'$ are for the outgoing
transitions of the initial state of $A'$, a state no run can return to. □

Proposition 16. $\mathcal{L}_{\mathbb{Q}\text{-DetAPA}} \subseteq \mathcal{L}_{\mathbb{N}\text{-DetAPA}}$ and $\mathcal{L}_{\mathbb{Q}\text{-APA}} \subseteq \mathcal{L}_{\mathbb{N}\text{-APA}}$.

Proof. We first recall that a set $C \subseteq \mathbb{Q}^d$ is $\mathbb{Q}$-definable iff it is a finite union of sets of the form:

$$\{\bar{x} \mid f_1(\bar{x}) = \cdots = f_p(\bar{x}) = 0 \wedge g_1(\bar{x}) > 0 \wedge \cdots \wedge g_q(\bar{x}) > 0\},$$

where $f_1, \ldots, f_p, g_1, \ldots, g_q: \mathbb{Q}^d \to \mathbb{Q}$ are affine functions (see, e.g., [6]). Let $(A, U, C)$ be a
$\mathbb{Q}$-APA of dimension $d$; by Lemma 15 we may suppose that the functions associated with the
transitions are linear, except for the transitions that may begin a run. We suppose $C$ is a
single set of the kind previously described; this is no loss of generality as $\mathcal{L}_{\mathbb{K}\text{-APA}}$ and $\mathcal{L}_{\mathbb{K}\text{-DetAPA}}$
are closed under union. So let $C$ be described by functions $f_i$ and $g_i$ as above, and suppose
$d = p + q$ (we add constant functions to the $f_i$'s or 0's to the vectors of $C$ in order to do
that). Define $f: \mathbb{Q}^d \to \mathbb{Q}^d$ by $f(\bar{x}) = (f_1(\bar{x}), \ldots, g_1(\bar{x}), \ldots)$; clearly, $f \in \mathcal{F}_d$ (over $\mathbb{Q}$). Now let
$(A, U', C')$ be the $\mathbb{Q}$-APA of dimension $2d$, defined by $(U'(t))(\bar{x}, \bar{y}) = ((U(t))(\bar{x}), f(\bar{x}))$, with $t$
a transition of $A$ and $\bar{x}, \bar{y} \in \mathbb{Q}^d$; and $C' = \mathbb{Q}^d \cdot \{0\}^p \cdot (\mathbb{Q}^+)^q$. Clearly, $L(A, U', C') = L(A, U, C)$.
We then define $U''$ by $U''(t) = c \times U'(t)$ where $c$ is a common multiple of the denominators of the reduced
fractions appearing in the matrix and vector of $U'(t)$. Thus, the functions given by $U''$ are
from $\mathbb{Z}^{2d}$ to $\mathbb{Z}^{2d}$. Moreover, for any $\pi \in \mathrm{Run}(A)$, $(U''(\pi))(\bar{0}) = k \times (U'(\pi))(\bar{0})$, for some $k \in \mathbb{N}^+$
depending only on $\pi$. Thus, defining $C'' = \mathbb{Z}^d \cdot \{0\}^p \cdot (\mathbb{Z}^+)^q$, we have $L(A, U'', C'') = L(A, U', C')$.
Finally, the negative numbers can be circumvented by doubling the dimension of the matrices
and keeping track of the negative and the positive contributions separately until the final tests
for zero, which become tests that the negative contribution equals (or is strictly less than) the
positive contribution of a number (a similar technique is used by Klaedtke and Rueß [15]). □
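The last step of the proof, which avoids negative numbers by doubling the dimension, can be sketched as follows. This is one possible realization under our own assumptions: an integer affine map $(M, \bar{v})$ is split as $M = M^+ - M^-$ and $\bar{v} = \bar{v}^+ - \bar{v}^-$ with nonnegative parts, and a vector $\bar{x}$ is represented by a pair of nonnegative vectors whose difference is $\bar{x}$. The proof itself does not spell out the construction, so the details below are hypothetical.

```python
def matvec(M, x):
    """Matrix-vector product M.x on plain Python lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def step_signed(M, v, x):
    """Reference update over the integers: x -> M.x + v."""
    return [a + b for a, b in zip(matvec(M, x), v)]

def step_split(M, v, xp, xn):
    """Same update on a pair (xp, xn) of nonnegative vectors with x = xp - xn.

    Only nonnegative matrices and vectors are used:
      new_p = M+ . xp + M- . xn + v+
      new_n = M+ . xn + M- . xp + v-
    """
    Mp = [[max(e, 0) for e in row] for row in M]
    Mn = [[max(-e, 0) for e in row] for row in M]
    vp = [max(e, 0) for e in v]
    vn = [max(-e, 0) for e in v]
    new_p = [a + b + c for a, b, c in zip(matvec(Mp, xp), matvec(Mn, xn), vp)]
    new_n = [a + b + c for a, b, c in zip(matvec(Mp, xn), matvec(Mn, xp), vn)]
    return new_p, new_n
```

With this representation, a final test that a coordinate is zero becomes a test that the two tracked contributions are equal, and a positivity test becomes a strict comparison between them.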

Remark. The previous proof shows that the constraint set of a Q-APA can be simulated within
the automaton, and is thus of lesser use.

We now give a large class of languages belonging to $\mathcal{L}_{\mathbb{Q}\text{-APA}}$. Define $\mathcal{M}_\cap(L)$ as the smallest
semiAFL containing $L$ and closed under intersection; that is, $\mathcal{M}_\cap(L)$ is the smallest class of
languages containing $L$ and closed under nonerasing and inverse morphism, intersection with a
regular set, union, intersection, and concatenation. With $\mathrm{PAL} = \{w \# w^R \mid w \in \{a, b\}^+\}$:

Proposition 17. $\mathcal{M}_\cap(\mathrm{PAL}) \subseteq \mathcal{L}_{\mathbb{Q}\text{-APA}}$.

Proof. We sketch a Q-DetAPA for PAL. The automaton starts by reading a single letter: if it
is an $a$, it initializes its counters to $(2, 1)$; otherwise, it initializes them to $(2, 0)$. Then, for each
letter read, it applies the function $(p, v) \mapsto (2p, v + p)$ if the letter is an $a$, and $(p, v) \mapsto (2p, v)$ if it
is a $b$. Upon reaching the # sign, the functions associated with $a$ and $b$ change: when reading an $a$,
the automaton applies $(p, v) \mapsto (p/2, v - p/2)$; otherwise it applies $(p, v) \mapsto (p/2, v)$. Clearly, a
word is in PAL iff it is of the form $\{a, b\}^+ \# \{a, b\}^+$ and the final state of the counters is $(1, 0)$.
The closure properties are implied by those of $\mathcal{L}_{\mathbb{Q}\text{-APA}}$ established in an earlier proposition. □
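
The counter updates in this sketch can be replayed directly. The Python function below is a minimal simulation under the assumption that words are given as strings over `a`, `b` and `#`; the name `accepts_pal` is hypothetical, and exact rationals from `fractions.Fraction` stand in for the registers over Q.

```python
from fractions import Fraction


def accepts_pal(word):
    """Replay the sketched Q-DetAPA for PAL on a string over {a, b, #}."""
    if not word or word[0] not in "ab":
        return False
    # First letter: initialize (p, v) to (2, 1) on 'a' and to (2, 0) on 'b'.
    p = Fraction(2)
    v = Fraction(1 if word[0] == "a" else 0)
    seen_hash = False
    letters_after_hash = 0
    for letter in word[1:]:
        if letter == "#":
            if seen_hash:
                return False          # shape check: exactly one '#'
            seen_hash = True
        elif letter in "ab":
            if not seen_hash:
                if letter == "a":
                    v += p            # (p, v) -> (2p, v + p)
                p *= 2                # on 'b': (p, v) -> (2p, v)
            else:
                letters_after_hash += 1
                p /= 2
                if letter == "a":
                    v -= p            # (p, v) -> (p/2, v - p/2)
        else:
            return False
    # Shape {a,b}+ # {a,b}+ together with final counters (1, 0).
    return seen_hash and letters_after_hash >= 1 and (p, v) == (1, 0)
```

For instance, `accepts_pal("ab#ba")` holds while `accepts_pal("ab#ab")` does not: the register $v$ stores the first half as a binary number, and the second half must subtract the same bits in reverse order.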

The class $\mathcal{M}_\cap(\mathrm{PAL})$ contains a wide range of languages. First, the closure of PAL under
nonerasing and inverse morphism and intersection with regular sets is the class of linear languages
(e.g., [1]). In turn, adding closure under intersection makes it possible to express the languages of
nondeterministic multipushdown automata where, in every computation, each pushdown store makes
a bounded number of reversals (that is, going from pushing to popping) [3]; in particular,
if there is only one such pushdown store, this corresponds to the ultralinear languages [10].
Further, as $\mathcal{M}_\cap(\mathrm{COPY}) \subseteq \mathcal{M}_\cap(\mathrm{PAL})$ (e.g., [3]), this implies that $\mathrm{COPY} \in \mathcal{L}_{\mathbb{Q}\text{-APA}}$.

Next, we note that K-APA express only context-sensitive languages (CSL):

Proposition 18. $\mathcal{L}_{\mathbb{N}\text{-APA}} \subseteq \mathrm{CSL}$.

Proof. Let $(A, U, C)$ be an N-APA of dimension $d$; we show that $L(A, U, C) \in \mathrm{NSPACE}[n]$
(which is equal to CSL [18]). Let $A = (Q, \Sigma, \delta, q_0, F)$, and $w = w_1 \cdots w_n \in \Sigma^*$. First, initialize
$v \leftarrow 0^d$ and $q \leftarrow q_0$. Iterate through the letters of $w$: on the $i$-th letter, choose
nondeterministically a transition $t$ from $q$ labeled with $w_i$. Update $v$ by setting $v \leftarrow (U(t))(v)$ and $q$ with
$q \leftarrow \mathrm{To}(t)$. Upon reaching the last letter of $w$, accept $w$ iff $q \in F$ and $v \in C$.

We now bound the value of $v$. Let $c$ be the greatest value appearing in any of the matrices
or vectors in $U(t)$, for any $t$. For a given $v$, let $\max v$ be $\max\{v_1, \ldots, v_d\}$. Then for any $t$,
$((U(t))(v))_i \leq d \times (c \times \max v) + c$. Let $\pi$ be a path; we then have that $((U(\pi))(0))_i \leq
(c(d + 1))^{|\pi| - 1} c$, thus the bit size of $v$ at the end of the algorithm is in $O(n)$. Now note that, as
$C$ is semilinear, the language of the binary encoding of its elements is regular [22], and thus,
checking $v \in C$ can be done efficiently. Hence the given algorithm is indeed in NSPACE[n]. □
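
The algorithm of this proof can be mimicked by a deterministic program that tracks all reachable configurations at once. The Python sketch below does so; it is an exhaustive simulation, not a linear-space nondeterministic procedure. The encoding of transitions as a dictionary and the names `step`, `configurations`, `accepts`, `DELTA`, `FINALS` and `in_c` are assumptions made for illustration. The toy N-APA recognizes $\{a^n b^m \mid m = 2^n - 1\}$ with registers $(x, y)$ and constraint $x = y$.

```python
def step(matrix, vector, v):
    """Apply the affine map v -> matrix.v + vector over the naturals."""
    return tuple(sum(m * x for m, x in zip(row, v)) + b
                 for row, b in zip(matrix, vector))


def configurations(delta, q0, dim, word):
    """All (state, registers) pairs reachable after reading `word`."""
    configs = {(q0, (0,) * dim)}          # registers start at the zero vector
    for letter in word:
        configs = {(target, step(matrix, vector, v))
                   for (q, v) in configs
                   for (matrix, vector, target) in delta.get((q, letter), [])}
    return configs


def accepts(delta, q0, finals, in_c, dim, word):
    """Accept iff some run ends in a final state with registers in C."""
    return any(q in finals and in_c(v)
               for q, v in configurations(delta, q0, dim, word))


IDENT = [[1, 0], [0, 1]]
DELTA = {
    (0, "a"): [([[2, 0], [0, 1]], [1, 0], 0)],   # x <- 2x + 1
    (0, "b"): [(IDENT, [0, 1], 1)],              # y <- y + 1
    (1, "b"): [(IDENT, [0, 1], 1)],              # y <- y + 1
}
FINALS = {0, 1}


def in_c(v):
    """Semilinear constraint x = y."""
    return v[0] == v[1]
```

Here $c = 2$ and $d = 2$, so after reading a word of length $n \geq 1$ every register is at most $(c(d+1))^{n-1} c = 2 \cdot 6^{n-1}$, in line with the bound used in the proof.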









We now note that the power of K-DetAPA does not owe to their capabilities as automata:

Proposition 19. Let $\Sigma$ be an alphabet. There exists a two-state automaton $A_\Sigma$ such that for
any K-DetAPA over $\Sigma$, there exists a K-DetAPA accepting the same language whose underlying
automaton is $A_\Sigma$.

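Proof. Let $(A, U, C)$ be a K-DetAPA of dimension $d$ where $A = (Q, \Sigma, \delta, q_0, F)$, with $Q =
\{1, \ldots, k\}$ and $\Sigma = \{a_1, \ldots, a_m\}$. Let $N = k(d + 1)$; we show that there exist $f_{a_1}, \ldots, f_{a_m} \in \mathcal{F}_N$,
a K-definable set $G \subseteq \mathbb{K}^N$ and $o \in \mathbb{K}^N$ such that:

$$w = \ell_1 \cdots \ell_{|w|} \in L(A, U, C) \iff f_{\ell_{|w|}} \circ \cdots \circ f_{\ell_1}(o) \in G. \qquad (1)$$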

Our goal is to represent the state in which the K-DetAPA is with a vector of size $N$. This
vector is composed of $k$ smaller vectors of size $(d + 1)$. On taking a path $\pi$ in $A$, let $q = \mathrm{To}(\pi)$
and $v = (U(\pi))(0^d)$; then $q$ and $v$ describe the current configuration of the K-DetAPA. Thus
we define, for any $q \in Q$ and $v \in \mathbb{K}^d$:

$$\mathrm{Vec}(q, v) = (0^{d+1} \cdots 0^{d+1} \ \underbrace{1 \ v}_{q\text{-th subvector}} \ 0^{d+1} \cdots 0^{d+1}).$$

Now, for $t \in \delta$, let $M_t$ and $b_t$ be such that $U(t) = (M_t, b_t)$. For the purpose of describing the
matrix $U_a$ below, when $t \notin \delta$ we let $M_t$ stand for the all-zero matrix of dimension $d \times d$ and
$b_t$ be the all-zero vector of dimension $d$. Let $\chi$ be the characteristic function of $\delta$. For $a \in \Sigma$,
define:



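$$U_a = \begin{pmatrix}
\begin{matrix} \chi((1,a,1)) & 0 \cdots 0 \\ b_{(1,a,1)} & M_{(1,a,1)} \end{matrix}
& \cdots &
\begin{matrix} \chi((k,a,1)) & 0 \cdots 0 \\ b_{(k,a,1)} & M_{(k,a,1)} \end{matrix} \\
\vdots & \ddots & \vdots \\
\begin{matrix} \chi((1,a,k)) & 0 \cdots 0 \\ b_{(1,a,k)} & M_{(1,a,k)} \end{matrix}
& \cdots &
\begin{matrix} \chi((k,a,k)) & 0 \cdots 0 \\ b_{(k,a,k)} & M_{(k,a,k)} \end{matrix}
\end{pmatrix}$$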



The matrix $U_a$ is such that for $(p, a, q) \in \delta$ and $v \in \mathbb{K}^d$, $U_a.\mathrm{Vec}(p, v) = \mathrm{Vec}(q, M_{(p,a,q)}.v + b_{(p,a,q)})$.
In other words, $U_a$ computes the transition function and, according to the current state, applies
the right affine function. More generally, for a path $\pi$ in $A$ starting at $q_0$ and labeled by
$w = \ell_1 \cdots \ell_{|w|}$, we have $U_{\ell_{|w|}} \cdots U_{\ell_1}.\mathrm{Vec}(q_0, 0^d) = \mathrm{Vec}(\mathrm{To}(\pi), (U(\pi))(0^d))$, where $0^d$ is the
all-zero vector of dimension $d$. We then let $G$ be the K-definable set which contains $\mathrm{Vec}(q, v)$ iff $q$
is final and $v \in C$: $G = \bigcup_{q \in F} \bigcup_{v \in C} \{\mathrm{Vec}(q, v)\}$.

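Now let $f_{a_i} \in \mathcal{F}_N$ be defined as $(U_{a_i}, 0^N)$ and let $o = \mathrm{Vec}(q_0, 0^d)$. Then we have precisely
Equation (1). Now let $A'$ be the automaton $(\{r, s\}, \Sigma, \delta', r, \{r, s\})$ defined by $\delta' = \{r, s\} \times \Sigma \times
\{s\}$. Define $U' \colon \delta' \to \mathcal{F}_N$ by:

$$U'((q, a, q'))(x) = \begin{cases} f_a(o) & \text{if } q = r, \\ f_a(x) & \text{if } q = s. \end{cases}$$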




Finally, a special case should be added for the empty word: We let $C' = G$ if $\varepsilon \notin L(A, U, C)$
and $C' = G \cup \{0^N\}$ otherwise. We have that $(A', U', C')$ is a K-DetAPA where $A'$ has only two
states, and it is of the same language as $(A, U, C)$. Finally, note that we need two states, and
not one, because K-APA use $0$ as the starting value for their registers but $o$ is needed here. □
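
The encoding used in this proof can be checked on a small instance. The Python sketch below builds $\mathrm{Vec}(q, v)$ and the block matrix $U_a$ for a deterministic toy automaton with $k = 2$ states and dimension $d = 2$. States are numbered from 0 rather than 1, and the names `vec`, `big_matrix`, `mat_vec`, `encode_run` and `DET_DELTA` are hypothetical, introduced only for this illustration.

```python
def vec(q, v, k, d):
    """Vec(q, v): zero everywhere except the q-th subvector, which is (1, v)."""
    out = [0] * (k * (d + 1))
    base = q * (d + 1)
    out[base] = 1
    out[base + 1: base + 1 + d] = list(v)
    return out


def big_matrix(delta, a, k, d):
    """Block matrix U_a: block (row q, column p) encodes the transition (p, a, q)."""
    n = k * (d + 1)
    u = [[0] * n for _ in range(n)]
    for (p, letter), (m, b, q) in delta.items():
        if letter != a:
            continue
        r0, c0 = q * (d + 1), p * (d + 1)
        u[r0][c0] = 1                       # chi((p, a, q)) = 1
        for i in range(d):
            u[r0 + 1 + i][c0] = b[i]        # column holding b_(p,a,q)
            for j in range(d):
                u[r0 + 1 + i][c0 + 1 + j] = m[i][j]   # block M_(p,a,q)
    return u


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def encode_run(delta, k, d, q0, word):
    """Compute U_{l_n} ... U_{l_1} . Vec(q0, 0^d) for word = l_1 ... l_n."""
    x = vec(q0, [0] * d, k, d)
    for letter in word:
        x = mat_vec(big_matrix(delta, letter, k, d), x)
    return x


IDENT = [[1, 0], [0, 1]]
DET_DELTA = {
    (0, "a"): ([[2, 0], [0, 1]], [1, 0], 0),   # stay in state 0, x <- 2x + 1
    (0, "b"): (IDENT, [0, 1], 1),              # go to state 1, y <- y + 1
    (1, "b"): (IDENT, [0, 1], 1),              # stay in state 1, y <- y + 1
}
```

On the word `aabbb` the run ends in state 1 with registers $(3, 3)$, and `encode_run` returns exactly `vec(1, [3, 3], 2, 2)`. A word with no run, such as `ba`, yields the all-zero vector, which is not of the form $\mathrm{Vec}(q, v)$ and hence lies outside $G$.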

We now give some negative properties of APA; our main tool is the following lemma: 

Lemma 20. Let $L$ be a Turing-recognizable language. Then there exist effectively $L_1, L_2 \in
\mathcal{L}_{\mathbb{Q}\text{-DetAPA}}$ and a morphism $h$ such that $L = h(L_1 \cap L_2)$.

Proof. This follows closely [H Theorem 1] , thus we only sketch the proof. Let M be a one-tape 
Turing machine, and suppose w.l.o.g. that M makes an odd number of steps on any accepting 
computation and that $M$ only halts on accepting computation. Let $L_1$ be the set of strings

$$ID_0 \# ID_2 \# \cdots \# ID_{2k} \, \$ \, (ID_{2k+1})^R \# \cdots \# (ID_3)^R \# (ID_1)^R \qquad (2)$$

such that the $ID_i$'s are instantaneous descriptions of configurations of $M$, $ID_0$ is an initial
configuration, $ID_{2k+1}$ is an accepting configuration, and for all $i$, $ID_{2i+1}$ is the configuration
which would be reached in one step from configuration $ID_{2i}$. Similarly, $L_2$ is the same as $L_1$ but
checks that $ID_{2i}$ is the successor of $ID_{2i-1}$. These languages are in $\mathcal{L}_{\mathbb{Q}\text{-DetAPA}}$, using a technique
similar to Proposition 17. Thus $L_1 \cap L_2$ is a language of $\mathcal{L}_{\mathbb{Q}\text{-DetAPA}}$ which encodes the strings
of the type of (2) such that the $ID_i$'s encode an accepting computation of $M$. Now if each string
$ID_i$, $i > 0$, is over an alphabet which is disjoint from the alphabet which encodes the initial
instantaneous description, then the morphism $h$ which erases all of the symbols in a string of
$L_1 \cap L_2$ except those representing the input is such that $L(M) = h(L_1 \cap L_2)$. □

Corollary 21. Neither $\mathcal{L}_{\mathbb{K}\text{-APA}}$ nor $\mathcal{L}_{\mathbb{K}\text{-DetAPA}}$ is closed under morphisms.

Corollary 22. The emptiness problem is undecidable for DetAPA.

Proof. Let $L \subseteq \Sigma^*$ be a Turing-recognizable language, and $x \in \Sigma^*$. Let $L_1, L_2, h$ be given by
Lemma 20 for $L$. Then $x \in L$ iff $L_1 \cap L_2 \cap h^{-1}(x)$ is nonempty, the latter being in $\mathcal{L}_{\mathbb{Q}\text{-DetAPA}}$. □

Recall that $\mathcal{L}_{\mathbb{K}\text{-APA}}$ is closed under concatenation. The previous property and the fact that a
language $L$ is empty iff $L \cdot \Sigma^*$ is finite imply:

Corollary 23. Finiteness is undecidable for K-APA.

5. Parikh automata on letters

The PA on letters requires that the "weight" of a transition only depend on the input letter from
$\Sigma$ triggering the transition. In a way similar to the CA characterization of PA, we characterize
PA on letters solely in terms of automata on $\Sigma$ and semilinear sets. This model helps us in
proving a standard lemma in language theory, in the context of PA.



Definition 5 (Parikh automaton on letters). A Parikh automaton on letters (LPA) is a PA
$(A, C)$ where whenever $(a, v_1)$ and $(a, v_2)$ are labels of some transitions in $A$, then $v_1 = v_2$. We
write $\mathcal{L}_{\mathrm{LPA}}$ (resp. $\mathcal{L}_{\mathrm{DetLPA}}$) for the class of languages recognized by LPA (resp. LPA which are
DetPA).

Now let $(A, C)$ be an LPA. We may determinize $A$ in the standard way and, although this is not
the case with a PA in general, the resulting automaton is a deterministic LPA, thus:

Proposition 24. $\mathcal{L}_{\mathrm{LPA}} = \mathcal{L}_{\mathrm{DetLPA}}$.

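For $R \subseteq \Sigma^*$ and $C \subseteq \mathbb{N}^{|\Sigma|}$, define $R|_C = \{w \in R \mid \Phi(w) \in C\}$. Then:

Proposition 25. Let $L \subseteq \Sigma^*$ be a language. The following are equivalent:
(i) $L \in \mathcal{L}_{\mathrm{LPA}}$;

(ii) There exist a regular language $R \subseteq \Sigma^*$ and a semilinear set $C \subseteq \mathbb{N}^{|\Sigma|}$ such that $R|_C = L$.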
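
Characterization (ii) translates directly into a membership test. In the Python sketch below, a regular expression stands in for the regular language $R$ and an arbitrary predicate on count vectors stands in for the semilinear set $C$; these representations and the names `parikh` and `restrict` are assumptions made for illustration.

```python
import re
from collections import Counter


def parikh(word, alphabet):
    """Parikh image of `word`: number of occurrences of each letter of `alphabet`."""
    counts = Counter(word)
    return tuple(counts[letter] for letter in alphabet)


def restrict(regex, in_c, alphabet):
    """Membership test for R|_C = {w in R | parikh(w) in C}."""
    pattern = re.compile(regex)

    def member(word):
        return pattern.fullmatch(word) is not None and in_c(parikh(word, alphabet))

    return member


# {w in {a,b}* : |w|_a = |w|_b}, as R = {a,b}* and C = {(n, n)}.
balanced = restrict(r"[ab]*", lambda c: c[0] == c[1], "ab")

# {a^m b^n : m != n}, as R = a*b* and C = {(m, n) : m != n}.
unequal_blocks = restrict(r"a*b*", lambda c: c[0] != c[1], "ab")
```

Both examples reappear in the nonclosure arguments of this section: the regular part fixes the shape of the words, and the constraint only ever looks at letter counts.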

The following property will be our central tool for showing nonclosure results:

Lemma 26. Let $L \in \mathcal{L}_{\mathrm{LPA}}$. For any regular language $E$:

$$L \cap E \text{ is not regular} \implies (\exists w \in E)[c(w) \cap L = \emptyset].$$

Proof. Let $R \subseteq \Sigma^*$ be a regular language and $C \subseteq \mathbb{N}^{|\Sigma|}$ be a semilinear set. Define $L = R|_C$.
Let $E$ be a regular language such that $L \cap E$ is not regular. As $L \subseteq R$, we have $(L \cap E) \subseteq (R \cap E)$.
The left hand side being nonregular, those two sets differ. Thus, let $w \in (R \cap E)$ be such that
$w \notin L \cap E$; we have $w \notin L$. Hence, $w \in (R \setminus L)$, which implies that $\Phi(w) \notin C$, and in turn,
$c(w) \cap L = \emptyset$. □

Remark. Lemma 26 holds with, e.g., "context-free" in lieu of "regular", but the version given
will suffice for our purposes.

Proposition 27. (1) $\mathcal{L}_{\mathrm{LPA}}$ is not closed under union, complement, squaring, or nonerasing
morphisms;

(2) $\mathcal{L}_{\mathrm{LPA}}$ is closed under intersection, inverse morphisms, and commutative closure.

Proof. (1). (Union.) Let $L_1 = \{w \in \{a, b\}^* \mid |w|_a = |w|_b\}$ and $L_2 = b(a \cup b)^*$ be two languages
of LPA. Suppose $L = L_1 \cup L_2 \in \mathcal{L}_{\mathrm{LPA}}$. Let $E$ be the regular language $(a^+ b^+)$. By the pumping
lemma, $L \cap E$ is not regular, thus Lemma 26 states there exists $w \in E$ such that $c(w) \cap L = \emptyset$.
But $u = b^{|w|_b} a^{|w|_a} \in c(w)$ and $u \in L$, a contradiction.

(Complement.) Note that $L$ is the complement in $\{a, b\}^*$ of $\{a^m b^n \mid m > 0 \wedge m \neq n\}$, which is
the language of a LPA.

(Squaring.) Let $L = \{a^m b^n \mid m \neq n\} \in \mathcal{L}_{\mathrm{LPA}}$. Suppose $L^2 \in \mathcal{L}_{\mathrm{LPA}}$, and let $E = (a^+ b^+)^2$.
Again, $L^2 \cap E$ is not regular. Lemma 26 implies there exists $w \in E$ such that $c(w) \cap L^2 = \emptyset$. But
$a^{|w|_a} b^0 a^0 b^{|w|_b} \in c(w) \cap L^2$, a contradiction.

(Nonerasing morphisms.) We simply note that $L^2$ is the image of the language $\{a_1^m b_1^n a_2^r b_2^s \mid m \neq
n \wedge r \neq s\}$ by the morphism $h(a_i) = a$, $h(b_i) = b$.

(2). The proofs for the first two properties follow the usual proofs for finite automata. Closure
under the commutative closure operator follows from the proof of Proposition O □ 
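
The union argument can be checked by brute force on small words. The Python sketch below assumes the definitions $L_1 = \{w \mid |w|_a = |w|_b\}$, $L_2 = b(a \cup b)^*$ and $E = a^+ b^+$ from the proof; the function names `in_union` and `witness` are hypothetical. For each sampled $w \in E$ it builds the rearrangement $u = b^{|w|_b} a^{|w|_a}$ and confirms that $u$ is a permutation of $w$ lying in $L_1 \cup L_2$, so that $c(w) \cap L$ is never empty.

```python
def in_union(word):
    """Membership in L = L1 ∪ L2 over the alphabet {a, b}."""
    balanced = word.count("a") == word.count("b")   # L1: as many a's as b's
    starts_with_b = word.startswith("b")            # L2: b(a ∪ b)*
    return balanced or starts_with_b


def witness(word):
    """The rearrangement u = b^{|w|_b} a^{|w|_a} of `word`."""
    return "b" * word.count("b") + "a" * word.count("a")


# A finite sample of E = a+ b+.
SAMPLE = ["a" * m + "b" * n for m in range(1, 6) for n in range(1, 6)]
```

On the same sample, `in_union("a" * m + "b" * n)` holds exactly when `m == n`, which is the nonregular intersection $L \cap E = \{a^n b^n \mid n \geq 1\}$ used to invoke Lemma 26.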

Finally, we use LPA to show the following property, which has a standard form known to be
true for regular [19] and context-free languages [2] (the latter recently reworked in [8]). This
property is sometimes called Parikh-boundedness:

Proposition 28. For any $L \in \mathcal{L}_{\mathrm{PA}}$, there exists a bounded language $L' \in \mathcal{L}_{\mathrm{PA}}$ such that $L' \subseteq L$
and $\Phi(L) = \Phi(L')$.

Proof. Let $(A, C)$ be a constrained automaton, where $\delta$ is the transition set of $A$. Let $R \subseteq \delta^*$
and $D \subseteq \mathbb{N}^{|\delta|}$ be such that $\mu(R|_D) = L(A, C)$. As mentioned, we can find a bounded regular
language $R' \subseteq R$ such that $\Phi(R') = \Phi(R)$. In particular, $\Phi(R'|_D) = \Phi(R|_D)$. Closure under
morphism of $\mathcal{L}_{\mathrm{PA}}$ implies that $L' = \mu(R'|_D)$ is a bounded language of $\mathcal{L}_{\mathrm{PA}}$ included in $L(A, C)$.
Moreover, $\Phi(L(A, C)) = \Phi(\mu(R|_D))$, and thus, equals $\Phi(\mu(R'|_D)) = \Phi(L')$. □



6. Conclusion 



The following table summarizes the current state of knowledge concerning the PA and its vari- 
ants studied here; a class contains the class below it, and a language witnessing the separation 
is attached to the top class when we know this containment to be strict. 



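[Figure: inclusion diagram of the language classes. Classes shown: Context-Sensitive Languages, PA = RBCM, DetRBCM, DetPA. Witness languages shown: PAL, COPY, SANBN, NSUM.]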



An intriguing question is whether there are context-free or context-sensitive languages outside
$\mathcal{L}_{\mathbb{N}\text{-APA}}$. How difficult is that question? How about $\mathcal{L}_{\mathbb{N}\text{-DetAPA}}$? We have been unable to locate
the latter class meaningfully. In particular, can $\mathcal{L}_{\mathbb{N}\text{-DetAPA}}$ be separated from $\mathcal{L}_{\mathbb{N}\text{-APA}}$?



The following summarizes the known closure and decidability properties for PA variants, and
proposes open questions:

Several questions thus remain open concerning the poorly understood (and possibly overly
powerful) affine PA model. But surely we expect testing a LPA or a DetPA for regularity to
be decidable. How can regularity be tested for these models? One avenue for future research
towards this goal might be characterizing $\mathcal{L}_{\mathrm{DetPA}}$ along the lines of algebraic automata theory.

Acknowledgments. The first author thanks L. Beaudou, M. Kaplan, and A. Lemaitre.